A nomogram for predicting cancer-specific survival in patients with uterine clear cell carcinoma: a population-based study

Uterine clear cell carcinoma (UCCC) is a relatively rare endometrial cancer, and information on its prognosis is limited. This study aimed to develop a model for predicting the cancer-specific survival (CSS) of UCCC patients based on data from the Surveillance, Epidemiology, and End Results (SEER) database between 2000 and 2018. A total of 2329 patients initially diagnosed with UCCC were included. Patients were randomized into training and validation cohorts (7:3). Multivariate Cox regression analysis identified age, tumor size, SEER stage, surgery, number of lymph nodes detected, lymph node metastasis, radiotherapy and chemotherapy as independent prognostic factors for CSS. Based on these factors, a nomogram for predicting the prognosis of UCCC patients was constructed. The nomogram was validated using the concordance index (C-index), calibration curves, and decision curve analysis (DCA). The C-index of the nomogram in the training and validation sets was 0.778 and 0.765, respectively. Calibration curves showed good consistency between observed and nomogram-predicted CSS, and DCA showed that the nomogram has great clinical utility. In conclusion, this study established the first prognostic nomogram for predicting the CSS of UCCC patients, which can help clinicians make personalized prognostic predictions and provide accurate treatment recommendations.


Scientific Reports | (2023) 13:9231 | https://doi.org/10.1038/s41598-023-36323-w | www.nature.com/scientificreports/

nomogram has been developed for the prognosis of UCCC patients. The purpose of this study was to construct a nomogram using UCCC patient data extracted from the Surveillance, Epidemiology, and End Results (SEER) database and then validate the predictive model to determine its performance.

Results
Patient characteristics. A total of 2329 patients were finally included and randomly divided into a training cohort of 1591 and a validation cohort of 738. The data selection flow chart is shown in Fig. 1. Continuous variables were converted to categorical variables using optimal cut-off values determined by X-Tile software: 60 and 70 years for age, 30 and 70 mm for tumor diameter, and 2 and 9 for the number of detected lymph nodes. The clinicopathological characteristics of the training and validation cohorts are shown in Table 1; there was no significant difference between the two groups (p > 0.05). Median follow-up was 56 months (range 1-227 months). During this period, 853 (36.6%) cancer-specific deaths occurred, and the cumulative 5- and 10-year CSS for the entire cohort were 58.8% and 54.8%, respectively.
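The conversion of continuous variables into three-level categories can be sketched as follows. This is a minimal Python illustration, not the authors' X-Tile or R workflow. The function name `categorize`, the group labels, and the boundary convention (values equal to a cut-off fall in the middle group) are assumptions; the convention is inferred from the "< 2" and "more than 9" lymph node groups mentioned in the Discussion and is assumed to carry over to age and tumor size.

```python
# Cut-off pairs reported in the text, as (low, high).
CUTOFFS = {
    "age_years": (60, 70),
    "tumor_size_mm": (30, 70),
    "lymph_nodes_detected": (2, 9),
}


def categorize(value, variable):
    """Return 'low', 'middle', or 'high' for a continuous value.

    Assumption: values equal to either cut-off belong to the middle group.
    """
    low, high = CUTOFFS[variable]
    if value < low:
        return "low"
    if value <= high:
        return "middle"
    return "high"


# Hypothetical patient, for illustration only.
patient = {"age_years": 66, "tumor_size_mm": 25, "lymph_nodes_detected": 12}
groups = {name: categorize(value, name) for name, value in patient.items()}
print(groups)
```

If the original analysis placed boundary values differently, only the two comparisons in `categorize` would need to change.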
Construction of the nomogram. In the training set, univariate Cox analysis showed that the following factors were significantly associated with CSS: age, race, marital status, tumor size, pathological grade, SEER stage, AJCC stage, surgery, number of lymph nodes detected, lymph node metastasis, radiotherapy and chemotherapy (all p < 0.05). Multivariate Cox regression analysis showed that age, tumor size, SEER stage, surgery, number of lymph nodes detected, lymph node metastasis, radiotherapy and chemotherapy were independent prognostic factors for CSS (Table 2). In the multicollinearity analysis performed among these variables, all VIFs were less than 2 (data not shown), indicating no multicollinearity between them. Based on these clinicopathological factors, a personalized nomogram for predicting the prognosis of UCCC patients was constructed; SEER stage had the greatest impact on prognosis. To use the nomogram, the clinician locates each of a patient's clinicopathological characteristics on its axis, reads the corresponding score from the points scale, and sums the scores to obtain the total points. Drawing a vertical line from the total points scale down to the survival scales then gives the patient's 5- and 10-year probability of survival (Fig. 2).
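The point-scoring idea behind a Cox-based nomogram can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' R implementation: the predictor whose regression coefficients span the widest range is scaled to 0-100 points, and every other level is scaled proportionally. The function name `nomogram_points` and all coefficient values are hypothetical; they are invented for illustration and are not taken from Table 2.

```python
def nomogram_points(log_hazard_ratios):
    """Map Cox coefficients to nomogram points.

    log_hazard_ratios: {variable: {level: coefficient}}, with the
    reference level of each variable set to 0.0.
    The variable with the widest coefficient range spans 0 to 100 points.
    """
    widest = max(
        max(levels.values()) - min(levels.values())
        for levels in log_hazard_ratios.values()
    )
    points = {}
    for variable, levels in log_hazard_ratios.items():
        baseline = min(levels.values())  # lowest-risk level scores 0 points
        points[variable] = {
            level: 100.0 * (beta - baseline) / widest
            for level, beta in levels.items()
        }
    return points


# Hypothetical coefficients, NOT the values estimated in the study.
HYPOTHETICAL_COEFS = {
    "seer_stage": {"localized": 0.0, "regional": 0.8, "distant": 2.0},
    "surgery": {"no": 0.0, "yes": -1.0},
}

points = nomogram_points(HYPOTHETICAL_COEFS)
# Total points for a hypothetical patient with regional disease and no surgery.
total = points["seer_stage"]["regional"] + points["surgery"]["no"]
print(round(total, 1))
```

In a full nomogram the total points are then mapped to 5- and 10-year survival probabilities through the baseline survival function of the fitted Cox model, which this sketch does not cover.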
Validation of the nomogram. The C-index of the nomogram in the training set and validation set was 0.778 (95% CI 0.758-0.798) and 0.765 (95% CI 0.743-0.787), respectively, indicating that the nomogram has good prediction accuracy. Calibration curve analysis showed that the survival rate predicted by the nomogram was in good agreement with the actual survival rate, indicating that the nomogram was well calibrated (Fig. 3). DCA showed that, at nearly all threshold probabilities, using the nomogram to predict outcomes in UCCC patients provided a greater net benefit than the reference strategies of assuming that all patients or no patients die, suggesting that the nomogram has potential clinical applicability. Furthermore, the nomogram curve was higher than the SEER stage curve in the DCA, indicating that the nomogram was superior to the SEER staging system (Fig. 4).
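For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is an illustration rather than the routine used in the study, and the toy data are hypothetical. It assumes that a pair is comparable only when the patient with the shorter follow-up had the event, that pairs with tied follow-up times are skipped, and that tied risk scores count as half concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times: follow-up times; events: 1 if the event occurred, 0 if censored;
    risk_scores: higher score means higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:  # patient i had the event before time j
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, event indicator, risk score.
times = [5, 12, 20, 30]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.7, 0.4, 0.1]
c_index = harrell_c_index(times, events, risk_scores)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported 0.778 and 0.765 should be read.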

Discussion
In this study, we developed a nomogram for predicting CSS in UCCC patients based on eight predictors: age, tumor size, SEER stage, surgery, number of lymph nodes detected, lymph node metastasis, radiotherapy and chemotherapy. The predictors included in the model can be easily obtained in clinical practice. Validation of the model using several statistical methods demonstrated its excellent performance. Furthermore, DCA demonstrated that our nomogram predicted survival with better clinical benefit and utility than the conventional staging system. UCCC is rare and is considered prone to myometrial invasion, lymphovascular invasion, lymph node metastasis and extrauterine metastasis, so most cases are diagnosed at a later stage. Because of its rarity, there are few studies on UCCC, and these are usually single-center, small-sample studies 13,22-24; thus, high-quality evidence on its biological characteristics, optimal treatment options, and prognostic assessment is currently lacking. At present, in clinical practice, obstetricians and gynecologists often evaluate the prognosis of UCCC patients and formulate follow-up treatment plans according to the patient's AJCC or FIGO stage, pathological grade, and intraoperative findings 9,14,15. However, this approach relies largely on the clinical experience of physicians and does not allow a more comprehensive survival analysis and prognostic evaluation according to the patient's disease characteristics. Therefore, a more systematic diagnosis and treatment plan and prognostic risk assessment for UCCC are urgently needed. Previous studies indicated that age, tumor size and pathologic stage might be important factors affecting the prognosis of UCCC 22-24. However, because of the small number of cases in these studies, their conclusions are inconsistent.
Based on national data from a relatively large cohort, our study found that age, tumor size, SEER stage, surgery, number of lymph nodes detected, lymph node metastasis, radiotherapy and chemotherapy were significantly correlated with the prognosis of UCCC. Among them, SEER stage is the most important factor affecting prognosis: the higher the SEER stage, the worse the prognosis. According to the nomogram, surgery is the second most important factor for the survival of UCCC patients. Currently, total hysterectomy plus bilateral adnexectomy plus pelvic and para-aortic lymph node dissection has been established as first-line treatment 23,25. This comprehensive staging surgery allows accurate staging and provides a reference for the subsequent selection of appropriate adjuvant therapy. Lymph node metastasis is one of the main factors affecting the prognosis of patients with endometrial cancer. However, the effect of lymphadenectomy on the survival of UCCC patients remains controversial. In many studies, systematic lymph node dissection has resulted in better outcomes for patients with UCCC 14,26. Conversely, other studies have shown that lymphadenectomy has no prognostic value 2,22. One reason for this discrepancy may be that the number of lymph nodes dissected was not taken into account. Our study observed that patients with more than 9 lymph nodes removed had better CSS than those with < 2 lymph nodes removed. However, because information on the extent of lymph node dissection is lacking, we could not compare the effects of systematic lymphadenectomy and less extensive lymphadenectomy (such as sentinel lymph node dissection or sampling) on prognosis; this needs to be addressed in future research. Age is an independent prognostic factor for UCCC, which is consistent with previous studies 23,27.
In addition, multivariate analysis showed that radiotherapy and chemotherapy were also protective factors affecting the prognosis of UCCC patients. This is the first study to establish a prognostic model for UCCC. Based on the SEER database, this study integrated the relevant clinicopathological factors and treatment patterns affecting the prognosis of UCCC patients into a nomogram, thereby constructing a predictive model consistent with the condition of UCCC patients. Compared with the SEER staging system (a surrogate for traditional FIGO staging), it has the advantages of being comprehensive, intuitive, more accurate and convenient. The large multi-center sample also supports the credibility of the final model. This study has several limitations. First, the SEER database lacks detailed information about chemotherapy and radiotherapy, and there are no data on surgical margins, extent of pelvic lymph node dissection, or lymph node invasion, which may affect the prognosis of UCCC. Second, the nomogram has only been validated internally. External validation using cohorts and prospective randomized clinical trials from other countries is needed to confirm its performance. Third, there may be selection bias because of the retrospective nature of the analysis.
In conclusion, we developed a nomogram for predicting CSS in UCCC patients based on the SEER database, which can help clinicians make individualized prognosis predictions and provide accurate treatment recommendations.

Extracted data included: gender, age, race, marital status, tumor location, tumor size, year of diagnosis, pathological grade, SEER stage, AJCC TNM staging (7th edition), surgery, chemoradiotherapy, follow-up time and survival. The SEER stage (local, regional, and distant) was used to classify the extent of the disease as a surrogate for the traditional FIGO staging. The primary endpoint of the study was cancer-specific survival (CSS), defined as the time from diagnosis to death from UCCC or to last follow-up. The optimal cut-off values for continuous variables were determined using the "X-Tile" software (Yale School of Medicine, CT, USA), converting age, tumor size, and number of lymph nodes dissected into categorical variables.

Statistical analysis. The final included UCCC patients were randomly assigned to the training set and the validation set in a 7:3 ratio using R software. The training set was used to build a risk prediction model and to construct a nomogram to predict a patient's CSS at 5 and 10 years. The validation set was used for internal validation. Count data were compared between groups using the chi-square test or Fisher's exact test; multi-category variables were compared between groups using the chi-square test or Fisher's exact test for R × C tables. Continuous variables were compared using the t-test or the Mann-Whitney U test. In the training set, univariate and multivariate analyses were performed with Cox proportional hazards regression models to identify independent prognostic factors associated with CSS. Patient characteristics with p < 0.05 in univariate analysis were included in the multivariate analysis.
A nomogram model was constructed based on the independent prognostic factors identified in the multivariate analysis. The variance inflation factor (VIF) was also assessed among the covariates in the nomogram, and VIF > 4.0 was interpreted as indicating multicollinearity. Variables with a VIF greater than 4.0 were not included in the final model. The discrimination and calibration of the model were evaluated by the concordance index (C-index) and the calibration curve (1000 bootstrap resamples). The larger the C-index, the more accurate the prognosis prediction. Calibration curves describe the difference between predicted probabilities and actual outcomes. The x-axis represents the predicted survival probability and the y-axis represents the observed survival probability. For a perfectly calibrated model, the curve would fall along the 45° diagonal. The clinical utility of the nomogram was assessed by applying decision curve analysis (DCA) to calculate the net benefit over a range of threshold probabilities. The y-axis represents the net benefit and the x-axis represents the threshold probability. All statistical analyses were performed using R software version 4.1.3. P < 0.05 was considered statistically significant.
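The net benefit plotted in a decision curve can be illustrated with the standard formula NB = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The sketch below is a simplified illustration and not the authors' R code: it assumes a binary outcome at a fixed time point with no censoring, whereas a survival-based DCA such as the one in this study must also account for censored follow-up. The function names and the toy data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of classifying patients as high risk when risk >= threshold.

    predicted_risks: predicted event probabilities (0 to 1).
    outcomes: 1 if the event occurred, 0 otherwise (no censoring assumed).
    threshold: threshold probability, strictly between 0 and 1.
    """
    n = len(outcomes)
    true_pos = sum(1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risks, outcomes) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return true_pos / n - (false_pos / n) * weight


def net_benefit_all(outcomes, threshold):
    """Reference strategy: classify every patient as high risk."""
    return net_benefit([1.0] * len(outcomes), outcomes, threshold)


# Hypothetical toy data, for illustration only.
risks = [0.9, 0.8, 0.2, 0.1]
outcomes = [1, 1, 0, 0]
nb_model = net_benefit(risks, outcomes, 0.5)
nb_all = net_benefit_all(outcomes, 0.5)
print(nb_model, nb_all)
```

The other reference strategy, classifying no patient as high risk, has a net benefit of zero at every threshold, so a model is useful at a given threshold only when its net benefit exceeds both zero and the classify-all value.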

Methods
Ethics approval and consent to participate. Approval was waived by the local ethics committee, as SEER data is publicly available and de-identified.

Data availability
The data that support the findings of this study are openly available in the software package SEER*Stat 8.4.0.1 (https://seer.cancer.gov/seerstat/).